Quality improvement of PSOLA analysis-synthesis using partial zero-phase conversion

Authors

  • Nobuaki Minematsu
  • Seiichi Nakagawa
Abstract

This paper discusses two issues in improving the quality of F0-modified speech produced by PSOLA analysis-synthesis. Previous studies [1][2] pointed out that the location of the PSOLA window influences the quality of the synthesized speech, and one of them claimed that the center of the window should be located at a pitch pulse in the source waveform. However, pitch pulse detection sometimes fails due to undesired acoustic events. In this paper, several methods for reducing pitch pulse detection errors are examined experimentally. Even when detection is correct, F0-modified re-synthesized speech sometimes exhibits “echoes” in the re-arranged waveforms. These are mainly caused by a pitch pulse that is not sharp enough, or by one that has two relatively high pulses, which are not pitch pulses, before and after it. To suppress the echoes with little loss of naturalness, partial zero/π-phase conversion is proposed. Experiments show that the proposed methods are effective in improving the quality of re-synthesized speech.
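As a rough illustration of what zero/π-phase conversion does to a single analysis frame, the following Python sketch keeps the magnitude spectrum of a frame and replaces its phase with zero (or π), which concentrates the energy into one symmetric pulse at the frame center. This is only a sketch under assumptions: the function name `zero_phase_frame`, the use of NumPy's real FFT, and the centering step are illustrative choices, not the authors' implementation. The abstract does not say how the conversion is made "partial", so only full conversion of one frame is shown.

```python
import numpy as np

def zero_phase_frame(frame, pi_phase=False):
    """Return a frame with the same magnitude spectrum but zero (or pi) phase.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(frame)
    magnitude = np.abs(np.fft.rfft(frame))
    # Zero phase: every component becomes a cosine peaking at sample 0.
    # Pi phase: the same pulse with inverted polarity.
    spectrum = -magnitude if pi_phase else magnitude
    pulse = np.fft.irfft(spectrum, n=n)
    # Circularly shift the pulse to the middle of the frame, where a
    # PSOLA window would be centered.
    return np.roll(pulse, n // 2)

# Example frame: a decaying oscillation whose main peak is off-center.
t = np.arange(64)
frame = np.exp(-t / 10.0) * np.sin(2 * np.pi * t / 8.0 + 0.7)
converted = zero_phase_frame(frame)
```

After conversion the largest sample sits at the frame center and the magnitude spectrum is unchanged, so only the phase-dependent waveform shape differs from the input.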

Similar resources

Flexible harmonic/stochastic speech synthesis

In this paper, our flexible harmonic/stochastic waveform generator for a speech synthesis system is presented. The speech is modeled as the superposition of two components: a harmonic component and a stochastic or aperiodic component. The purpose of this representation is to provide a framework with maximum flexibility for all kinds of speech transformations. In contrast to other similar systems...


Non-stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase and Reassignment

A new Analysis/Synthesis method, named SINOLA, based on “sinusoidal additive” and “OLA/PSOLA” synthesis, is proposed. It allows high-quality transformation of both stationary and non-stationary parts of a signal. Time-frequency characterization and synthesis parameter estimation are done by a novel method based on spectrum peak shape distortions and time-frequency phase evolutions.


Diphone-Based Concatenative Speech Synthesis System for Mongolian

This paper describes the first Text-to-Speech (TTS) system for the Mongolian language, using the general speech synthesis architecture of Festival. The TTS is based on diphone concatenative synthesis, applying the TD-PSOLA technique. The conversion from input text into an acoustic waveform is performed in a number of steps consisting of functional components. Procedures and functions for the s...


SINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum

In this paper we present a new Analysis/Synthesis method named SINOLA, which benefits from both the sinusoidal additive model and the OLA/PSOLA method, and which allows adequate processing according to the inherent local characteristics of the signal. All the parameters of the models are derived at the same time from spectrum analysis. We propose an analytical formulation of a Complex Short-Time Spectr...


A new synthesis algorithm using phase information for TTS systems

New speech synthesis algorithms capable of flexible prosody (especially F0) modification are desired for a high quality TTS system. TD-PSOLA is the most popular synthesis algorithm. The algorithm shows very high quality when F0 modification is limited. However, the quality degradation due to pitch epoch detection error becomes severe as the F0 modification factor becomes large. On the othe...




Publication date: 2000